Polymorphs (and out-of-date virus definitions)
Introduction
In this article I will sink into the shadowy world that most users are all too aware of: viruses, or to be more specific polymorphic viruses. I know this is an unpleasant topic which causes nervous mutterings amongst coders and users alike. I've had second (and third) thoughts about whether to actually write this article, but as someone once said:
"Knowledge is power"
(especially if you know how to mend a broken fuse ;)
This is definitely NOT a "how to code a virus" guide or a pat on the back for all those destructive virus writers out there, instead it's a brief peek into the underworldly realms of binary vandals and the commerical profit machine.
Polymorphic?
What's a polymuppet? (Hmm... I believe it's a virtual puppet made up from vertices and triangles for Sesame Street.. hehe.)
Seriously, a polymorph is something which has the ability to change its appearance. In the case of programming it's usually a virus, but the technique could be useful to protect your own productions or simply as an exercise for your imagination and coding skills.
It basically means an encryption scheme in which the true patterns of virus instructions are masked, hidden from both virus scanners and the wanna-be hacker newbie. Most virus scanners (such as Norton Anti-virus, Dr. Solomon, McAfee etc...) use small sequences of 80x86 instruction bytes to 'assess' the properties and function of the program in question. For example if a program changes certain interrupt vectors, attempts to duplicate itself or makes itself resident in memory then perhaps it is a virus.
Please note I said 'perhaps' because of course real (non-virus) programs could also have a true, legitimate reason for changing interrupt vectors and so on...
The whole idea behind a polymorphic virus is to conceal all of the common code fragments so that an anti-virus program can not use them as markers to identify an infected program. A fundamental encoding method is the famous XOR.
mov di, offset decode ; [ES:DI] --> encoded block
mov cx, 100 ; decode 100 words
mov ax, 0DEADh ; 'encryption key'
again:
xor ax, ES:[di] ; \ decode 1 word
add ax, cx ; /
stosw ; store decoded word
loop again
decode:
<-- encoded code goes here -->
The above is a really, really simple example of a decoder loop. The main 'payload' (body of the virus) instructions would be encoded using a similiar routine so to disguise the payload from a virus-scanner.
Sooner or later the virus-scanner would be updated to recognize the above code fragment (and possibly some of the encoded instruction block).
Polymorphic
A polymorphic virus goes beyond the encryption stage. You can tell from its name that it's 'morphic' in behaviour, which means it has the ability to change itself with each new generation. A really basic example of this morphic technique would be to generate a new encryption-key so that the actual encoded payload has a new disguise. To modify the previous example you only need a single MOV instruction.
EncKey: mov ax, 0DEADh ; 'encryption key'
... code/payload ...
call RandomNumber
mov word ptr [EncKey+1], ax
Encapsulated Polymorphic
You may have thought that generating a new random encryption key to encode to main payload would have made the virus totally impossible to detect using the normal sequence matching technique(s), but in fact there is an obvious flaw.
The actual payload decoder (the XOR) loop is static and so those 17 or so bytes of code could be detected using nothing more than a REP CMPSB. To plug this loop-hole the decoder needs to be encrypted so that those 17 odd bytes are randomly scrambled too.
But now we have a paradox, the decoder needs to be encoded using a polymorphic method, but if the decoder is encrypted then it will simply crash when run by the operating-system. Those 17 odd instruction bytes MUST be safe, valid and executable code.
But how?
I can remember hearing about polymorphic viruses a long time ago and thinking that a true polymorphic virus was impossible. Staying completely true to my nature... I was, of course, completely wrong. ;)
Here is a simple analogy which I used to uncover my own stupidity.
1) You have a magic key, a locksmith, some paint and a trunk.
2) The locksmith hides in the trunk leaving the magic key outside.
In the case of a virus the key is the decoder, the trunk is the payload and the locksmith is the key generator (and some other, new, tricks).
3) The magic key opens the trunk and the locksmith climbs out (the payload has been decoded).
4) The locksmith uses the paint to change the appearance of the trunk (i.e. encode the payload using a new encryption method).
5) The locksmith now engineers a completely new magic key (this is the new trick - the 'magic key' routine is built from a number of instructions which the locksmith has hidden in the trunk).
6) The locksmith gets back into the trunk and hides.
And no article would be complete without a dodgy looking ASCII...
the trunk
019247329047902714093
3ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿2
5³ ³4 magic key
2³ locksmith ³532
3³ ÚÄÁÄ¿4 ÚÄÄ¿
4³ payload ³ ³ ³6 ÂÂÂÄÄÄ´ ³
5³ ³ ³ ³4 ³ ÀÄÄÙ
7³ ÀÄÂÄÙ2
8³ new key parts ³805
2³ ³2
1ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ1
938674360102387620112
Having a locksmith in the trunk (with the new key parts) means that the magic key (the decoder) can be polymorphic too by using a variety of decoding parts. In the case of a virus this could be as simple as randomly choosing between a XOR, ADD, NOT, ROL or ROR instruction. The locksmith could also place random dummy instructions in the decoding loop too to try and defeat the anti-virus programs. As you know by now there are millions of different way to perform the same task. In the case of a LOOP you could use:
dec cx
jnz again
or,
sub cx, 1
nop
jg again
(...insert your own version(s) here...)
Closing words
As you can hopefully see from the locksmith example it allows the entire trunk, payload and magic key to all be reshaped without the loop-hole of the static, 17 odd byte example while still generating executable code.
Looking into the murky pond of viruses should be of interest to all coders, not just those who want to write a polymorphic virus. The code generation of the 'locksmith' is rife for code optimization and a whole heap of other forms of research. In fact there already is, genetic algorithms..
Oh well, I've probably opened a can of worms with this article. Some may criticise my stupidity "Why the hell are you helping virus coders to create new viruses?" but I honestly did NOT intend this to be a "how to code a virus" tut. For a start any coder must first need to understand the complexitites of writing a virus and be skilled enough to develop the above polymorphic idea into a working program, so they probably already have the knowledge/skill to produce a polymorph already. Besides, there are other techniques which any half decent anti-virus program should have to combat viruses (such as system monitoring and file size/contents checks).
Let's end with a bit of Latin:
ars est celare artem
(true art is to conceal art).